Wearable information system having at least one camera

ABSTRACT

The invention is related to a wearable information system having at least one camera, the information system operable to have a low-power mode and a high-power mode. The information system is configured such that the high-power mode is activated by a detection of at least one object in at least one field of view of the at least one camera.

BACKGROUND OF THE INVENTION

1. Technical Field

The invention is related to a method and a system capable of providing multimedia information to a user at reduced battery consumption.

2. Background Information

Examples of standard approaches, their limitations and existing solutions are provided below.

Smartphones, audio guides and similar information systems have become popular in recent years. Augmented Reality, as a new user interface paradigm, has seen great progress, especially based on computer vision algorithms for object recognition and pose estimation. Head-mounted displays equipped with a camera have been known for some time (e.g. U.S. Pat. No. 7,245,273).

One major obstacle to the true success of ubiquitous information systems, which are able to always watch the user's surroundings for interesting objects, is the high power consumption of image processing algorithms run on the application processor or the GPU.

The closest state of the art we could find to our invention is U.S. Pat. No. 7,302,089. The '089 Patent describes the idea of running a mobile device in a low-power mode (standby) and a high-power mode (active). In low-power mode, the camera may take a low-resolution image and match it against a known symbol (e.g. the face of the user of the phone or an image). If a known symbol is found in the low-resolution image, the phone may wake up and take a higher-resolution image to verify the result and basically unlock the phone in order to take a call or similar things. The goal of the '089 Patent is to enable power-efficient unlocking based on images. A scenario might be that the phone is lying around and, as soon as it receives a call, it tries to check if the user gets in sight in order to unlock the screen.

SUMMARY OF THE INVENTION

What differs in our invention is first of all the purpose and possible applications, in that we are not trying to unlock a device, but we are trying to provide information to the user about objects in the user's surroundings in a power-efficient way. The present invention is especially well suited to be used with head-mounted displays and a camera pointed at the space in front of the user (e.g., as shown in FIG. 12). A possible scenario could be the user walking through a museum that exhibits 200 images, of which 20 are part of a guided tour. The user starts the guided tour, e.g. as an application on his information system, and starts walking through the museum. After a certain time, the system moves to low-power mode. The user can now enjoy hours of walking through the museum without worrying about his information system's battery. According to the present invention, the information system is capable of scanning the user's environment for interesting objects (e.g. interesting pieces in the exhibition). This can be done while consuming little power. As soon as an interesting piece comes into sight, the system can “wake up” and move to a high-power mode, for example in order to download interesting content and display it using Augmented Reality, or in order to start an audio clip explaining the piece.

Another advantage of the invention is improved reaction time for applications like indoor navigation. The low-power mode allows the system to wake up when it recognizes that new data has to be downloaded or when a new navigation model or a new computer vision model needs to be stored in memory. After preparing everything, the system can move to a low-power mode again. As soon as a waypoint comes into sight, the system can quickly power up and provide the user with relevant information. The user might also activate the system himself (e.g. when he's lost) and the system can immediately provide navigational information.

Different from the state of the art, aspects of the present method can provide much more sophisticated detection algorithms at low power consumption (e.g., as compared to U.S. Pat. No. 7,302,089). Because aspects of the present invention work on higher-level feature descriptors and on different image resolutions, it can handle much bigger databases of objects and can detect those objects much more reliably. The objects can also be of arbitrary 3D shape.

Many tasks in the processing of images taken by a camera, such as in augmented reality applications and computer vision, require finding points or features in multiple images of the same object or scene that correspond to the same physical 3D surface. For example, in augmented reality, the main problem is to determine the position and orientation of the camera with respect to the world (i.e., the camera pose).

The standard approach to the initialization of optical tracking (i.e. when no knowledge from a previous frame is available) can be divided into three main building blocks: feature detection, feature description, and feature matching (e.g., see FIG. 1). As the skilled person will understand, if knowledge from a previous frame is not available, that does not mean that knowledge from non-optical sensors, like GPS or a compass, is not allowed. Feature detection is also referred to as feature extraction.

At first, feature detection is performed for identifying features in an image by means of a method that has a high repeatability. In other words, the probability is high that the method will choose the part of an image corresponding to the same physical 3D surface as a feature for different viewpoints, different rotations and/or illumination settings (e.g. local feature descriptors such as SIFT (e.g., see Lowe, David G. “Distinctive Image Features from Scale-Invariant Keypoints.” International Journal of Computer Vision 60.2 (2004): 91-110; hereinafter referred to as “Lowe”), shape descriptors (e.g., see Bosch, A., Andrew Zisserman, and X. Munoz. “Representing shape with a spatial pyramid kernel.” Image Processing 5 (2007): 401-408; referred to hereinafter as “Bosch”) or other approaches known to the skilled person). Features are usually extracted in scale space, i.e. at different scales. Therefore, each feature has a repeatable scale in addition to its two-dimensional position. In addition, a repeatable orientation (rotation) is computed from the intensities of the pixels in a region around the feature, e.g. as the dominant direction of intensity gradients.

Next, a feature descriptor is determined in order to enable the comparison and matching of features. Common approaches use the computed scale and orientation of the feature to transform the coordinates of the feature descriptor, which provides invariance to rotation and scale. For instance, the descriptor may be an n-dimensional real-numbered vector, which is constructed by concatenating histograms of functions of local image intensities, such as gradients (as in Lowe). Alternatively, a descriptor might be an n-dimensional binary vector (e.g., as disclosed in Leutenegger, Stefan, Margarita Chli, and Roland Y. Siegwart. “BRISK: Binary robust invariant scalable keypoints.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011).

Finally, an important task is the feature matching. Given a current feature detected in and described from a current intensity image, the goal is to find a feature that corresponds to the same physical 3D or 2D surface in a set of provided features that will be referred to as reference features. The simplest approach to feature matching is to find the nearest neighbor of the current feature's descriptor by means of exhaustive search and choose the corresponding reference feature as the match. More advanced approaches employ spatial data structures in the descriptor domain to speed up matching. Unfortunately, there is no known method that would enable nearest neighbor search in high-dimensional spaces that is significantly faster than exhaustive search. That is why common approaches use approximate nearest neighbor search instead, e.g. enabled by space-partitioning data structures such as kd-trees (see Lowe).
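
For illustration only, the following Python sketch shows the exhaustive nearest-neighbor matching described above; the array shapes and the toy data are assumptions, not part of the invention:

```python
import numpy as np

def match_exhaustive(current_desc, reference_desc):
    """Brute-force nearest-neighbor matching of descriptor vectors.

    current_desc:   (n, k) array, one k-dimensional descriptor per current feature
    reference_desc: (m, k) array, one k-dimensional descriptor per reference feature
    Returns a list of (current_index, reference_index, distance) tuples.
    """
    matches = []
    for i, c in enumerate(current_desc):
        # squared Euclidean distance to every reference descriptor
        dists = np.sum((reference_desc - c) ** 2, axis=1)
        j = int(np.argmin(dists))
        matches.append((i, j, float(dists[j])))
    return matches

# toy example: 4 current and 6 reference descriptors of dimension 8
rng = np.random.default_rng(0)
print(match_exhaustive(rng.random((4, 8)), rng.random((6, 8))))
```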

FIG. 1 (in connection with FIG. 2) shows a flow chart of a standard method to match a set of current features with a set of reference features. In step S11, a current image CI taken with a capturing device is provided. The next step S12 then detects and describes features in the current image CI (optionally: already selective extraction according to estimated model feature positions), where every resulting current feature c has a feature descriptor d(c) and a 2D position in the camera image CI. Possible methods that could be used for feature detection and description are explained in more detail below referring to exemplary implementations. A set of reference features r, each with a descriptor d(r) and optionally a (partial) position and/or orientation in a global coordinate system, is provided in step S13. The reference features can be extracted from reference images or 3D models or other information about the object. Please note that the position and/or orientation in a global coordinate system is optional in case of visual search and classification tasks. In step S14, the current features c from step S12 and the reference features r from step S13 are matched. For example, for every current feature, the reference feature is searched that has the closest descriptor to the descriptor of the current feature with respect to a certain distance measure. According to step S15, an application uses the feature matches, e.g. in order to estimate the position and orientation of the capturing device very accurately in an augmented reality application that integrates spatially registered virtual 3D objects into the camera image.

An example of an already proposed solution includes a visual-inertial tracking method disclosed in Bleser, Gabriele, and Didier Stricker. “Advanced tracking through efficient image processing and visual-inertial sensor fusion.” Computers & Graphics 33.1 (2009): 59-72, which applies inertial sensors to measure the relative movement of the camera from the prior frame to the current frame. This knowledge is used for predicting the position and defining a 2D search space in the image space for features that are tracked from frame to frame. Since the technique uses measurements of relative camera transformations only, it is not suited for the initialization of camera pose tracking or visual search tasks.

Therefore, it would be beneficial to provide an information system and a method of operating the same which enable a higher performance and higher algorithmic flexibility at reduced processing and power requirements while performing visual computing tasks, thus enabling a reduced battery consumption.

Aspects of the present invention are concerned with an information system according to claim 1 and a method of operating an information system.

According to an aspect of the invention, a method of matching image features with reference features comprises the following steps: providing a current image captured by a capturing device, providing reference features, wherein each of the reference features comprises at least one reference feature descriptor, determining current features in the current image and associating with each of the current features at least one respective current feature descriptor, and matching the current features with at least some of the reference features by determining a respective similarity measure between each respective current feature descriptor and each respective reference feature descriptor, the determination of the similarity measure being performed on an integrated circuit by hardwired logic or configurable logic which processes logical functions for determining the similarity measure.

According to the present invention, a new approach is proposed on how visual computing tasks can be optimized and run more robustly in real time by implementing dedicated parts in hardware.

A further effect of the invention is to improve the initialization of an optical tracking system based on pre-learned data (e.g., reference features) in order to enable a higher performance at reduced processing and power requirements.

According to another aspect of the invention, there is provided an integrated circuit for matching of image features with reference features, comprising an interface for receiving a number of current feature descriptors of respective current features taken from a current image captured by a capturing device, an interface for receiving a number of reference feature descriptors of respective reference features, and a logic circuit for determining a respective similarity measure between each respective current feature descriptor and each respective reference feature descriptor for matching the current features with the reference features, wherein the logic circuit comprises hardwired logic or configurable logic which processes logical functions for determining the similarity measure.

In a preferred embodiment, our method is implemented on a specialized hardware block and only partially executed by a general-purpose processor. The hardware block can of course be part of the same integrated circuit (also referred to as silicon or chip) as the general-purpose processor.

In a preferred embodiment, the specialized hardware block is a non-programmable unit, wherein the term “programmable” refers to executing a dynamic sequence of general-purpose instructions.

In a preferred embodiment, the current image can be an intensity image or a depth image.

When we speak of intensity images throughout this disclosure, we refer to images representing different amounts of light reflected from the environment, mostly depending on the environment's material and the light situation. Intensity images can encode intensity in one channel (e.g. a greyscale channel) or in more than one channel (e.g. in RGB, i.e. red-green-blue channels) in different bit resolutions (e.g. 8 bit or high dynamic range).

There are several possible methods to provide a depth image or sparse depth information comprising the depth of an element, e.g. a pixel or a feature, in an image, which may be used in a matching process according to the present invention. These methods are described in the following paragraphs.

According to an embodiment, to determine a depth of at least one element in an intensity image, at least two capturing devices with known relative position and/or orientation each capture a respective intensity image, wherein correspondences are found in the images and the relative position and/or orientation of the capturing devices is used to calculate a depth of at least one element in the intensity images which is part of at least one of the correspondences. In this case, the matching process would be conducted in two general steps: first, matching features of current frame one and current frame two in order to calculate their depth information from a given pose between capturing devices one and two; in a later step, the current features are then matched against reference features, taking advantage of the depth information or derived positional information during the matching process.
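
A minimal sketch of the triangulation step for such a two-camera setup, assuming a rectified stereo pair and a simple pinhole model (the focal length and baseline values are illustrative assumptions):

```python
def depth_from_disparity(x_left, x_right, focal_length_px, baseline_m):
    """Depth of a matched feature in a rectified stereo pair.

    x_left, x_right: horizontal pixel coordinates of the same physical
                     point in the left and right image.
    Returns depth in metres along the optical axis.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("non-positive disparity: point at infinity or bad match")
    return focal_length_px * baseline_m / disparity

# example: 700 px focal length, 12 cm baseline, 35 px disparity -> 2.4 m
print(depth_from_disparity(420.0, 385.0, 700.0, 0.12))
```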

According to another embodiment, to determine a depth of at least one element in an intensity image, at least one capturing device captures intensity images at different points of time from different positions, wherein correspondences are found in the different images and a relative position and/or orientation of the capturing device between the different images and a structure of the correspondences are recovered and used to calculate a depth of at least one element in the intensity images which is part of at least one of the correspondences. As in the case above, the matching could again be conducted in several processes, matching recent image features with each other and then incorporating the additional information in a matching process against older reference features.

According to another embodiment, to determine a depth of at least one element in an intensity image, there is provided at least one database of intensity images, wherein for each of the intensity images an overall depth, or a depth for at least one image region, or a depth for one or more pixels is known, and the intensity image captured by the capturing device (current intensity image) is matched against this database. The matching result is used to calculate a depth of at least one element in the current intensity image.

According to another embodiment, to determine a depth of at least one element in an intensity image, there is provided an environment model and information about a position and/or orientation of the capturing device when capturing the intensity image with respect to the environment model (which may be an initial estimation), wherein the environment model and the information about the position and/or orientation of the capturing device are combined and used to calculate a depth or a position estimate of at least one element in the intensity image.

According to another embodiment, to determine a depth of at least one element in an intensity image, there is provided at least one sensor for retrieving depth information or range data and at least a relative position and/or orientation of the at least one sensor with respect to the capturing device, wherein the depth information or range data is used to calculate a depth of at least one element in the intensity image. Preferably, the pose (position and orientation) and intrinsic parameters of both the sensor and the capturing device are known.

According to an embodiment, the reference features are extracted from at least one reference image which has been recorded with a second capturing device different from the capturing device. According to an embodiment, the capture time of the at least one reference image is at least one day older than the capture time of the current image.

As a similarity measure according to the present invention, for example, a distance measure may be used. According to an embodiment, the method of the invention may include determining at least one respective check parameter by comparing the distance measure with at least one respective threshold, wherein the check parameter is used as a criterion to determine whether the matching is performed or to influence the distance measure. If the matching is performed, the respective determined similarity measure is used in the matching process.

According to an embodiment of the invention, calculations for determining the respective distance measure, or parts thereof, and a respective check parameter (as explained in more detail below) are performed in parallel in a pipelined manner on the integrated circuit. For example, one respective distance measure and/or check parameter is calculated per clock cycle of a clock signal of the integrated circuit.

In an aspect of the invention, after determining a respective similarity measure, the method further comprises storing a most similar and a second most similar similarity measure from the similarity measures determined until then, and an index of the respective current feature descriptor associated with the most similar similarity measure.

According to an embodiment, the most similar similarity measure is compared with a derivative of the second most similar similarity measure, wherein if this comparison fulfills a predetermined condition, the most similar and second most similar similarity measures, the index, and the associated reference feature descriptor are provided for further processing.

The method may further comprise determining from the computed distance measures a lowest distance measure and storing an index of the respective current feature descriptor for which the lowest distance measure has been determined.

According to an aspect, the method may further include storing the current feature descriptors in a memory (such as SRAM) of the integrated circuit, from which they are retrieved without wait states.

According to an embodiment, the method further comprises the steps of associating with each of the current features at least one current feature descriptor vector, wherein each of the reference features comprises at least one reference feature descriptor vector, and computing a respective similarity measure between each of the reference feature descriptor vectors of the reference features and each of the current feature descriptor vectors of the current features.

In a possible implementation, at least a pixel of the current image is set as a respective current feature in the current image, i.e. every pixel of an image may represent a feature.

According to an embodiment of the invention, the method is run on different resolutions of the image.

According to an embodiment of the invention, a first feature extraction process may be used to extract a first set of current features, and a first set of current feature descriptors is built for the first set of current features; a second feature extraction process may be used to extract a second set of current features, and a second set of current feature descriptors is built for the second set of current features; wherein the first feature extraction process and the second feature extraction process, or a first feature descriptor creation process and a second feature descriptor creation process, are different from each other, and the feature matching of the first and the second set of current feature descriptors is performed by hardwired logic or configurable logic. The matching of the first and the second set is performed independently for each set, ideally by hardwired logic. FIG. 3 shows how the overall process could look, where 1 . . . n denotes different feature extraction methods, 1 . . . m denotes different feature descriptor processes, and the different resulting descriptor sets are matched before object detection takes place (see the sketch below). All this can, according to the invention, take place in low-power mode and, ideally, in the low-power subsystem.
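
A minimal Python sketch of how such independent extraction, description and matching runs could be orchestrated; the function names and pipeline layout are assumptions for illustration, not the invention's interface:

```python
def detect_and_match(image, pipelines, reference_sets, match_fn):
    """Run several independent feature pipelines (cf. FIG. 3) and match each
    resulting descriptor set against its own reference set, keeping the runs
    separate as described above.

    pipelines:      list of (extract_fn, describe_fn) pairs, e.g. one pair for
                    SIFT-like features and one for SURF-like features
    reference_sets: one reference descriptor set per pipeline
    match_fn:       the matching routine, e.g. a model of the hardware engine
    """
    all_matches = []
    for (extract, describe), refs in zip(pipelines, reference_sets):
        features = extract(image)                 # feature extraction 1..n
        descriptors = describe(image, features)   # descriptor process 1..m
        all_matches.append(match_fn(descriptors, refs))
    return all_matches
```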

If a relevant object is detected, a high-power application can automatically be started and offer relevant information to the user. Ideally, this can be an audio guide or an augmented reality interface, for example as described in Miyashita, T., et al. “An augmented reality museum guide.” Proceedings of the 7th IEEE/ACM International Symposium on Mixed and Augmented Reality. IEEE Computer Society, 2008.

In a further aspect, the method may comprise performing geometric verification after feature matching to remove incorrect feature matches, or to remove false positives in the case of classification. In the reference database, many features are stored, each corresponding to a class or pre-learned object. Depending on at least one of the number of matches between the current image's features and a pre-learned object's features and the distance measure of the matches, an object can be assumed to be matched, or more than one object can be assumed to be a candidate. In both cases, the high-power mode can be started. Alternatively, in case the low-power subsystem is capable of conducting geometric verification, the high-power mode can be started only after a successful geometric verification of an object.

According to an embodiment of the invention, the method may further comprise the step of providing a set of reference features, wherein each of the reference features comprises at least one first parameter which is at least partially indicative of a position and/or orientation of the reference feature with respect to a global coordinate system, wherein the global coordinate system is an earth coordinate system or an object coordinate system, or which is at least partially indicative of a position of the reference feature with respect to an altitude; the step of associating with a respective current feature at least one second parameter which is at least partially indicative of a position and/or orientation of the current feature with respect to the global coordinate system, or which is at least partially indicative of a position of the current feature with respect to an altitude; and the step of matching the current feature with at least one of the reference features of the set of reference features by determining the similarity measure between the at least one first parameter and the at least one second parameter.

For example, the method may include the step of defining a search space with a reduced number of reference features within the set of reference features when matching the respective current feature, wherein the search space is determined based on the at least one second parameter.
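
For illustration, a sketch of such a position-based restriction of the search space, assuming each reference feature carries a recorded 3D world position and the radius is a hypothetical threshold:

```python
import numpy as np

def restrict_search_space(current_pos, reference_positions, radius):
    """Return indices of reference features whose recorded world position
    lies within `radius` of the current feature's estimated position.

    Only these candidates are then passed on to descriptor matching, which
    both speeds up the search and suppresses implausible matches.
    """
    d2 = np.sum((reference_positions - current_pos) ** 2, axis=1)
    return np.flatnonzero(d2 < radius ** 2)

refs = np.array([[0.0, 0.0, 0.0], [5.0, 0.0, 1.0], [0.3, 0.1, 0.0]])
print(restrict_search_space(np.array([0.2, 0.0, 0.0]), refs, radius=1.0))  # [0 2]
```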

According to an embodiment, the method may include the step of considering indicators of the feature extraction process, for example the sign resulting from the feature extractor. For example, the sign of a SURF feature corresponds to the sign of the Laplacian of Gaussian during the feature extraction.

According to an embodiment of the invention, in a method for constructing a feature descriptor, feature points are extracted from the image to gain feature points in a 2-dimensional description (parameters a0, a1), and the feature orientation is computed for the extracted feature point using spatial information on the orientation of the capturing device (parameters b0, b1, b2) provided by a tracking system. For example, the tracking system gives the orientation of the capturing device with respect to a world coordinate system as Euler angles, and feature descriptors are supposed to be aligned with the gravitational force. A very simple way to gain the orientation for all features is to first transform the gravitational force to a coordinate system attached to the capturing device using the Euler angles and then project it onto the image plane. Thereby, the direction of the gravitational force in the image is computed and used for all features in the image. This technique assumes orthogonal projection, which is generally not the case. Incorporating the intrinsic parameters of the camera relaxes this assumption, but still all techniques based on 2D images assume everything visible in the image to lie on a plane and are therefore approximations. According to an embodiment of the invention, one or more directions of the at least one feature are computed based on pixel intensities of neighbouring pixels and stored with respect to the common coordinate system. In the matching stage, only features with similar directions with respect to the common coordinate system are matched to reduce the number of comparisons needed and decrease the ratio of false matches.
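
A minimal sketch of the orthogonal-projection approximation described above; the sign conventions (gravity along negative world z, camera x/y spanning the image plane) are assumptions for illustration:

```python
import numpy as np

def gravity_direction_in_image(R_world_to_cam):
    """Shared 2D orientation for all features in one image: rotate the world
    gravity vector into camera coordinates and project it onto the image
    plane (the orthogonal-projection approximation discussed above)."""
    g_world = np.array([0.0, 0.0, -1.0])   # gravity in world coordinates
    g_cam = R_world_to_cam @ g_world       # gravity in camera coordinates
    g_img = g_cam[:2]                      # drop the optical-axis component
    n = np.linalg.norm(g_img)
    if n == 0:
        raise ValueError("camera axis aligned with gravity; direction undefined")
    return g_img / n

# example: camera pitched by 30 degrees about its x-axis
a = np.deg2rad(30.0)
R = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
print(gravity_direction_in_image(R))       # gravity points along +y in the image
```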

According to an aspect of the invention, at least one of the current feature descriptor or the reference feature descriptor is a higher-level description of an object, making it invariant to scale and/or rotation and/or light.

According to embodiments of the invention, the method may also include a method of detecting and describing features from an intensity image which is invariant to the scale resulting from the distance between the capturing device and the object, but is sensitive to the real (physical) scale of an object, for a variety of applications. It is thus proposed to utilize the depth of an element in the intensity image (e.g. a pixel) for feature detection and/or description at that particular element (pixel) in an intensity image. Thereby, features can be detected and described at real (physical) scale, providing an improved distinctiveness compared to standard scale-invariant feature descriptors on intensity images, without introducing any constraints on the camera movement. In one embodiment, the method may comprise the steps of providing an intensity image captured by the camera, providing a method for determining a depth of at least one element in the intensity image, detecting at least one feature in the intensity image in a feature detection process, wherein the feature detection is performed by processing image intensity information of the intensity image at a scale which depends on the depth of at least one element in the intensity image, and providing a feature descriptor of the at least one detected feature.

Measurements of the position of the capturing device in a global coordinate system may be provided by a GPS sensor/receiver, IR or RFID triangulation, or by means of localization methods using a broadband or wireless infrastructure. Measurements of the orientation of the capturing device in a global coordinate system may be provided by at least one of an inertial sensor, an accelerometer, a gyroscope, a compass, or a mechanical, electromagnetic, acoustic, or optical tracking system. In the context of the invention, an inertial sensor may, e.g. continuously, provide sensor information including the position and/or orientation of an object or device with regard to the environment, by using any combination of the following: magnetometer (e.g. a compass), motion sensor/rotation sensor (accelerometers/gyroscopes), gravity sensor, and other sensors providing such information.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments of the invention will be further described with reference to the following Figures, in which:

FIG. 1 shows a flow chart of a standard method to match a set of current features with a set of reference features.

FIG. 2 is a depiction for illustrating detection, description and matching of features in connection with FIG. 1.

FIG. 3 describes an embodiment of a process of feature matching and a possible application thereof where certain algorithmic building blocks are conducted at low power consumption and other parts are conducted at high power consumption.

FIG. 4 shows an exemplary scene in which a method according to an embodiment of the invention is applied.

FIG. 5 shows a possible implementation of the determination of a similarity measure in a matching process on an integrated circuit according to an embodiment of the invention.

FIG. 6 depicts another possible implementation of the determination of a similarity measure in a matching process on an integrated circuit according to an embodiment of the invention.

FIG. 7 shows a flow chart of a general workflow of the process as described with reference to FIGS. 5 and 6.

FIG. 8 describes an embodiment of a process of preparing the low-power mode, switching to low-power mode and switching to high-power mode depending on certain process steps.

FIG. 9 shows a flow chart of a possible combination of a depth extraction mechanism with physical scale feature descriptors for use in optical pose estimation according to an embodiment of the invention.

FIG. 10 depicts a flow chart of a method according to another embodiment of the invention where it is checked prior to matching whether a feature may theoretically fit or not.

FIG. 11 shows a possible architecture of the low-power subsystem.

FIG. 12 shows a possible embodiment of the system.

FIG. 13 shows an overview of a possible embodiment of the overall processing system.

FIG. 14 shows another possible embodiment of the system.

DETAILED DESCRIPTION OF THE INVENTION

An initialization process has been briefly introduced in the beginning and is shown in FIG. 1. Running this process on application processing units usually requires clock rates of over 1 GHz. Therefore, today, only short-time usage of object detection applications on mobile devices is possible, since the battery is drained quickly.

According to preferred embodiments, it is possible to classify objects (visual search process), which is the process of matching a current image with a previously generated class description, or to match individual features (feature matching process), which can then be used, ideally by the application processing unit, to run a pose optimization process. Keep in mind that visual search may be based on matching several features per image. At the same time, the whole image might be a feature. Both approaches are supported by the present invention.

According to aspects of the present invention, both the visual search process and the feature matching process can work with different features and feature descriptors present in the database and present in the current image. In that case, the different features are extracted using different feature extraction and/or feature description methods and matched in two independent runs by the hardware unit (as indicated in FIG. 3). For example, first SIFT features are extracted and matched, then SURF features are extracted and matched.

FIG. 3 describes a process of feature matching similar to that described above with reference to FIG. 1 and an application of the matching for pose estimation, rendering 3D objects or playing an audio file. The rendering may take place on a graphics processing unit (GPU).

One advantage of the invention is the possibility to leave out processing steps which were necessary before, or to run them in a completely different, advantageous configuration. For example, the feature extraction process can be neglected or left out, creating a very high number of descriptors. Instead, every pixel or a very high number of randomly selected pixels may be chosen as the descriptor center. In this case, each pixel or each of the selected pixels is to be seen as a feature in terms of the present invention. Instead of choosing random pixels, a grid can be used to extract the descriptor centers, for example every 10th pixel of a line, where every 10th pixel row is analyzed. The massive increase in features to match (approx. 10,000 features per image) resulted in an increase of successful initializations by 76% on a test database of several thousand images.
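
A minimal sketch of such grid-based selection of descriptor centers; the image size and step width are illustrative values:

```python
def grid_descriptor_centers(width, height, step=10):
    """Descriptor centers on a regular grid: every `step`-th pixel of every
    `step`-th row, instead of running a feature detector. Each returned
    (x, y) pixel is treated as a feature in terms of the invention."""
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]

centers = grid_descriptor_centers(640, 480, step=10)
print(len(centers))  # 64 * 48 = 3072 descriptor centers for a VGA image
```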

Advantageously, it may be tested during the process whether a feature pair can theoretically fit or not. This may be achieved by checking the current feature's estimated position against the reference feature's recorded position. According to this aspect of the present invention, it is proposed to narrow the search space, or to influence the distance measure, for matching image features of a current image taken by a capturing device by considering the (partial) knowledge of their position in world coordinates (or global coordinates). A global coordinate system may be an earth coordinate system or an object coordinate system (e.g. a building or a product package or a car), which has a fixed altitude or a fixed orientation related to earth's gravity. As the degrees of freedom of a feature's position that can be determined depend heavily on the available information on the position and orientation of the capturing device, different exemplary implementations of aspects of the present invention are explained below in more detail with respect to FIGS. 4 and 10.

It is another aspect of the invention to take indicators of the feature extraction process into account, like the sign resulting from the SURF feature extractor (positive or negative Laplacian of Gaussian).

Another aspect of the invention takes into account not only the minimum distance between two feature descriptors, but also the relation between the minimum distance and the second-best distance. Particularly, two descriptors are considered a match if the second-best distance, multiplied by a scalar factor smaller than 0.9, is bigger than the best match's distance. This avoids the occurrence of false positives (e.g., wrongly matched features), which would lead to wrong classifications or problems in the pose estimation.
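
A minimal sketch of this distinctiveness check (the factor 0.8 is one admissible choice of a scalar factor smaller than 0.9):

```python
def passes_ratio_test(distances, factor=0.8):
    """Accept the best match only if the second-best distance, scaled by a
    factor below 0.9, still exceeds the best distance. `distances` holds the
    distances of one current feature to all candidate reference features."""
    d = sorted(distances)
    best, second = d[0], d[1]
    return best < factor * second

print(passes_ratio_test([12.0, 40.0, 41.5]))  # True: best match is distinctive
print(passes_ratio_test([12.0, 13.0, 40.0]))  # False: ambiguous match, rejected
```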

In another aspect of the invention, all current features of the current image are matched against each other, removing features which are very similar to each other (the distance measure being below a certain threshold). The filtered set of current features is then matched against the reference features.
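
A sketch of this self-matching filter; note that keeping one representative of each similar group, as done here, is one possible reading of the step, and the threshold is a hypothetical parameter:

```python
import numpy as np

def filter_similar_features(descriptors, threshold):
    """Match the current features against each other and drop features whose
    descriptor is very similar to an already kept one (squared descriptor
    distance below threshold**2)."""
    keep = []
    for i, d in enumerate(descriptors):
        if all(np.sum((d - descriptors[j]) ** 2) >= threshold ** 2 for j in keep):
            keep.append(i)
    return descriptors[keep]

desc = np.array([[0.0, 0.0], [0.05, 0.0], [3.0, 4.0]])
print(filter_similar_features(desc, threshold=0.5))  # drops the near-duplicate
```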

The present invention is well suited for object classification. The present invention is also well suited for camera pose initialization, where no or only incomplete prior knowledge about the object's pose in relation to the camera is available.

Feature detection:

A feature is a salient element in an image which can be a point, a line, a curve, a connected region or any other set of pixels. Also, a pixel, all pixels of an image, or each of a set of selected pixels may be defined as a feature in terms of the present invention.

Feature detection algorithms are usually saliency detectors. For example, they find lines, edges, or local extrema of a differential operator. A feature detector can be seen as a function mapping a region of pixels to a response. In the literature, this region is referred to as the sampling window or measurement aperture of the feature detector. The response is eventually thresholded to decide which elements are features and which are not. In order to extract features at a certain scale, either the sampling window can be scaled accordingly or the image is scaled before computing the response of the feature detector. The size of a feature is then defined as the size of the sampling window used to detect it.

Common examples of keypoint detection methods include Laplacian of Gaussian (LoG), Difference of Gaussians (DoG), Determinant of Hessian (DoH), Maximally Stable Extremal Regions (MSER), Harris features, or learning-based corner detectors such as FAST.

3D features also represent a possible data source for this invention. 3D features can be extracted from depth images or 3D models by many methods, for example by identifying local extrema.

In one aspect of the invention, the whole image may also be seen as a feature. In another aspect of the invention, the combination of 3D data and intensity data can be used as input data, as for example described in Wu, Changchang, et al. “3D model matching with Viewpoint-Invariant Patches (VIP).” IEEE Conference on Computer Vision and Pattern Recognition (2008): 1-8.

Feature/Image Description:

The extracted visual features (feature points, edges, corners, local extrema, etc.) need to be repeatable, which means that their extraction should be possible despite different viewpoints (orientation, scale, etc.), light conditions and/or image noise.

The matching process consists of finding at least one corresponding visual feature which is extracted from two or more images. It often requires the creation of descriptors that allow the same physical feature in different images to be described in a similar way with respect to some similarity or distance measure. An overview and comparison of some feature point descriptors is given in Mikolajczyk, K. and Schmid, C. “A Performance Evaluation of Local Descriptors.” IEEE Trans. Pattern Anal. Mach. Intell. 27.10 (2005): 1615-1630. Once one or multiple descriptors for every extracted feature are created, they are matched according to the similarity or distance measure: to every feature in the query image a match is assigned using the nearest descriptor or based on the ratio test of Lowe.

Bosch describes a descriptor that represents local image shape and its spatial layout, together with a spatial pyramid kernel.

Uchiyama, Hideaki, and Marchand, Eric. “Toward Augmenting Everything: Detecting and Tracking Geometrical Features on Planar Objects.” 2011 International Symposium on Mixed and Augmented Reality (2011): 17-25 (referred to hereinafter as “Uchiyama”) describes a descriptor based on the spatial relationship of features, which is also a possibility. The approach is to select the n nearest neighbors of a point X as a set Pn, select m < n points from Pn, and compute all possible invariants based on f points of the m (f = 5 for the cross ratio and f = 4 for the affine invariant). The sequence of the invariants in a fixed order is one descriptor of the point X. The affine invariant is the ratio between two triangle areas: A(a,c,d)/A(a,b,c). The perspective invariant is the cross ratio of triangle areas: (A(a,b,c)*A(a,d,e))/(A(a,b,d)*A(a,c,e)). In Uchiyama, a hashing process is used to match features, which could be left out using our engine.
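
For illustration, the two invariants can be computed from signed triangle areas as follows (the example points are arbitrary):

```python
def triangle_area(a, b, c):
    """Signed area of the triangle (a, b, c) given 2D points."""
    return 0.5 * ((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))

def affine_invariant(a, b, c, d):
    """Ratio of two triangle areas, A(a,c,d)/A(a,b,c), as in Uchiyama."""
    return triangle_area(a, c, d) / triangle_area(a, b, c)

def perspective_invariant(a, b, c, d, e):
    """Cross ratio of triangle areas: (A(a,b,c)*A(a,d,e))/(A(a,b,d)*A(a,c,e))."""
    return (triangle_area(a, b, c) * triangle_area(a, d, e)) / \
           (triangle_area(a, b, d) * triangle_area(a, c, e))

pts = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 3.0)]
print(affine_invariant(*pts[:4]), perspective_invariant(*pts))
```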

Taati, Babak. “Generation and Optimization of Local Shape Descriptors for Point Matching in 3-D Surfaces.” Thesis (Ph.D., Electrical & Computer Engineering), Queen's University, Kingston, Ontario, Canada, August 2009, as an example, gives a good overview of 3D and depth-image-based descriptors for matching.

Overall, for this invention, a descriptor can advantageously be a vector which is derived from a 2D image, a part of a 2D image, or 3D data, and which is created by more than just transforming pixels into a different color space or normalizing their values. In another aspect of the invention, descriptors are derived from histograms, statistics or relative relations on pixel, shape or depth values.

Matching Process:

The matching process is a key building block in the invention's solution. A possible layout according to an embodiment is shown in FIG. 5. A possible process diagram is shown in FIG. 7. According to an embodiment, it combines the following calculations:

As a similarity measure according to the present invention, for example, a distance measure may be used. D(c, r) describes an advantageous distance measure between two descriptors according to our invention. Particularly, it describes a distance measure between a current feature descriptor d(c) of a current feature c and a reference feature descriptor d(r) of a reference feature r. For example, current features c and reference features r and their feature descriptors d(c) and d(r) are determined and provided, respectively, as described above with respect to FIG. 1.

Generally, a respective distance measure D(c, r) may be determined between one or more properties of the respective current feature c, including the respective current feature descriptor d(c), and one or more properties of the respective reference feature r, including the respective reference feature descriptor d(r).

The method of the invention may include determining a respective first distance measure Δd between each respective current feature descriptor d(c) and each respective reference feature descriptor d(r) for the determination of the similarity measure D(c, r).

According to an embodiment, the method of the invention may include determining a respective second distance measure (here Δx and/or Δy) between position information x(c) and/or y(c) of the respective current feature descriptor d(c) in the current image and the respective position information x(r), y(r) of the respective reference feature descriptor d(r) in a common coordinate system for the determination of the similarity measure D(c, r). For example, this may be computed as the Euclidean distance between the 3D position information x(c) of the respective current feature described by d(c) and the 3D position information x(r) of the respective reference feature described by d(r).

According to a further embodiment, the method of the invention may include determining a respective third distance measure Δz indicative of an angle between the position information z(c) of the respective current feature descriptor d(c) in the current image and the position information z(r) of the respective reference feature descriptor d(r) in a common coordinate system for the determination of the similarity measure D(c, r). For example, this may be computed as the scalar product between a first vector z(c) defined by the camera center and the 3D position information of the respective current feature and a second vector z(r) defined by the camera center and the 3D position information of the respective reference feature.

In another embodiment, Δz can be indicative of an angle between the camera orientation with respect to a global coordinate system and an individual directional property of a feature, e.g. derived from the surface normal of a known surface on which the feature is located.

According to a further embodiment, the method of the invention may include determining a respective fourth distance measure (here, Δu and/or Δv) between a scalar property u(c) and/or v(c) of the respective current feature descriptor d(c) in the current image and the respective scalar property u(r), v(r) of the respective reference feature descriptor d(r) for the determination of the similarity measure D(c, r). For example, this may be computed from the sign of SURF (positive or negative Laplacian of Gaussian).

According to a further embodiment, the method of the invention may include determining a respective combined distance measure D(c, r) for the determination of the respective similarity measure by combining at least one of the respective first, second, third and fourth distance measures with at least another of the respective first, second, third and fourth distance measures.

For example, D(c, r) can be the combination of Δu, Δv, Δx, Δy, Δz, and/or Δd.

P(c, r) describes another advantageous, optional part of the invention's matching process. It may be used in a check whether two descriptors should be matched at all. Mostly, this is helpful to avoid wrong matches. P checks if certain conditions are met, depending on given thresholds.

According to an embodiment, the method of the invention may include determining a check parameter P, which is calculated in order to determine whether a feature pair c, r with one of the current features and one of the reference features is eligible to be a valid match.

According to an embodiment, the method of the invention may include determining at least one respective check parameter P(c, r) by comparing at least one of the respective second distance measure Δx and/or Δy, third distance measure Δz and fourth distance measure Δu, Δv with at least one respective threshold, wherein the check parameter P(c, r) is used to determine whether a feature pair c, r with one of the current features and one of the reference features is eligible to be a valid match.

For example, the method may further include weighting at least one of the properties of the respective current feature c and reference feature r, or at least one of the distance measures between one or more of the properties. Further, the method may include weighting at least one of the first, second, third and/or fourth distance measures when determining the combined distance measure D(c, r).

Particularly, each of the above described components can be given a weight (such as w_(u), w_(v), w_(x), etc.), which depends on the information available to the system. Information used here can be information coming from the feature extraction process or an estimation of the current feature's position in a global coordinate system or the camera coordinate system (e.g. for stereo matching taking advantage of epipolar geometry constraints). If this kind of information is not available, the respective weights in the formula for D(c, r) can be set to zero or to a value depending, for example, on the information's uncertainty. If incomplete or no information about Δu, Δv, Δx, Δy, Δz is given or available, the threshold values can be set to a very high value or be scaled, depending on uncertainty information.

According to an embodiment, the portions of the distance measure D(c, r) as described above, such as Δu, Δv, Δx, Δy, Δz, and Δd, can be determined as follows:

Δ u = (u(c) − u(r))² Δ v = v(c) − v(r)${\Delta \; x} = {\sum\limits_{i = 0}^{3}\left( {{x_{i}(c)} - {x_{i}(r)}} \right)^{2}}$${\Delta \; y} = {\sum\limits_{i = 0}^{3}\left( {{y_{i}(c)} - {y_{i}(r)}} \right)^{2}}$${\Delta \; z} = {\sum\limits_{i = 0}^{3}{{z_{i}(c)} \cdot {z_{i}(r)}}}$${\Delta \; d} = {\sum\limits_{i = 0}^{47}{{{d_{i}(c)} - {d_{i}(r)}}}}$

The given length of 48 for the feature descriptor shall be understood as a possible implementation of an embodiment and shall not be understood as limiting the invention. Naturally, the length could be longer or shorter. Similarly, other or additional types of distance measures may be computed and considered.

According to embodiments of the invention, as set out in the above formulas, calculating the respective distance measures may comprise computing sums of differences or sums of squared differences for determining the respective distance measure over a respective length or dimension (i).

In the case of a binary descriptor, the distance measure may also comprise computing a Hamming distance.
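
For illustration, a Hamming distance between two binary descriptors can be computed as the number of differing bits:

```python
def hamming_distance(d_c: bytes, d_r: bytes) -> int:
    """Hamming distance between two binary descriptors (e.g. BRISK-like),
    counted as the number of differing bits."""
    return sum(bin(a ^ b).count("1") for a, b in zip(d_c, d_r))

print(hamming_distance(b"\xff\x00", b"\x0f\x01"))  # 4 + 1 = 5 differing bits
```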

According to an embodiment, the check parameter P(c, r) and the distance measure D(c, r) can be determined as follows:

P(c, r) = (Δu < θ_(u)) ∧ (Δv < θ_(v)) ∧ (Δx < θ_(x)) ∧ (Δy < θ_(y)) ∧ (Δz < θ_(z))

D(c, r) = w_(u)·Δu + w_(v)·Δv + w_(x)·Δx + w_(y)·Δy + w_(z)·Δz + w_(d)·Δd

For example, the fields u, v, x, y, z and d can be integer or floating point storage units of arbitrary bit width. In one advantageous implementation of the invention, the descriptor fields d_(i) are each one byte long.
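
A software sketch of the two formulas above; the thresholds and weights are illustrative placeholders, since in the invention they come from the configuration registers and depend on the available information:

```python
import numpy as np

# Illustrative thresholds and weights (assumptions, not values of the invention)
THETA = {"u": 1.0, "v": 1.0, "x": 4.0, "y": 4.0, "z": 0.5}
W = {"u": 1.0, "v": 1.0, "x": 1.0, "y": 1.0, "z": 1.0, "d": 1.0}

def check_and_distance(c, r):
    """Check parameter P(c, r) and combined distance measure D(c, r) for two
    features given as dicts with scalar fields u, v and array fields x, y, z
    (length 4) and d (length 48), following the formulas above."""
    du = (c["u"] - r["u"]) ** 2
    dv = (c["v"] - r["v"]) ** 2
    dx = float(np.sum((c["x"] - r["x"]) ** 2))
    dy = float(np.sum((c["y"] - r["y"]) ** 2))
    dz = float(np.sum(c["z"] * r["z"]))              # scalar product, see above
    dd = float(np.sum(np.abs(c["d"] - r["d"])))      # sum of absolute differences
    P = (du < THETA["u"] and dv < THETA["v"] and dx < THETA["x"]
         and dy < THETA["y"] and dz < THETA["z"])
    D = (W["u"] * du + W["v"] * dv + W["x"] * dx
         + W["y"] * dy + W["z"] * dz + W["d"] * dd)
    return P, D
```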

The hardware was specifically designed to solve the whole descriptor matching problem efficiently, not just to accelerate the sum of absolute differences. If only a part is optimized, little performance gain is achieved, because of cache misses etc. Therefore, the hardware includes its own memory (in FIG. 5: SRAM 6), loading the vector of current descriptors (the current descriptors have been extracted from the current image).

With respect to the above described functions or steps of calculating the respective similarity measures, distance measures, combined distance measures, check parameters, etc., as set out above, the integrated circuit according to the present invention includes a respective unit or units implemented on the integrated circuit which perform the respective functions or steps. Examples of such units are described in more detail below with reference to FIGS. 5 and 6. These examples, however, shall not be understood as limiting the invention, as the skilled person will understand that there are multiple options for implementing the described functions or steps according to the teachings of the invention in hardwired logic or configurable logic.

According to FIGS. 5 and 7, a possible implementation of the determination of the similarity measure in a matching process according to an embodiment of the invention is shown. The similarity measure is determined on an integrated circuit 1, which may be configured in an embodiment as shown in FIG. 5. Particularly, the integrated circuit 1 includes hardwired logic or configurable logic which processes logical functions for determining the similarity measure. One embodiment of the invention runs as follows:

Via the peripheral interface 2, the host processor (not shown) accesses configuration registers 3 storing addresses, thresholds and weights (their usage is discussed later). Then it starts the operation by writing to a virtual trigger register. The external memory interface 4 reads the vector sets C (a number of current descriptor vectors c found in the current image) and R (a number of reference descriptor vectors r created based on reference images) from an external DRAM. C is completely read into the internal SRAM 6 when the operation starts, as explained above. Vectors from R are read one by one into the register 7 with content “vector r”. Vectors from the SRAM 6 are then read one by one into the register 8 with content “vector c”. The unit 9 (“subtract, multiply, add”) calculates the intermediate values Δu, Δv, Δx, Δy, Δz, Δd as discussed above. In compare unit 10, these values are compared to the thresholds (“compare, and”) and weighted in unit 11 (“multiply, add”), yielding the values P(c, r) and D(c, r) as described above. In case more current descriptor vectors c have been extracted from the current image than the SRAM 6 can hold at once, the current descriptor vectors may be divided into two or more portions (c1, c2, . . . cn), which may be loaded into the SRAM 6 and processed by the integrated circuit 1 one after another.

In unit 12, it is determined whether P is true. If P is true, then D is compared to the values D1, D2 in register 13, which register 13 is updated to contain the minimum value D1 and the second smallest value D2 of the values D(c, r) determined until then, and the index c of the minimal value D1 is kept as cmin. After all vectors c from the SRAM 6 have been processed, the condition D1 < t*D2 is checked in unit 14. In other words, it is determined whether the ratio of D1 and D2 falls below a defined threshold t from the configuration register 3, in order to determine whether D1 is significantly smaller than D2. If the condition is true, then a new tuple [r, cmin, D1, D2] is sent to the output buffer 15. When the output buffer 15 is full, its content is written to external memory via the external memory interface 4 and the memory bus. The overall control of this process is performed by control unit 16.

FIG. 5 shows an overview of an embodiment of the components, whereas FIG. 7 shows the general workflow of the process as described above.
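
For illustration, a software model of this control flow, assuming the `check_and_distance` function sketched above (hardware details such as pipelining and the output buffer are abstracted away):

```python
def match_engine(C, R, t, check_and_distance):
    """Software model of the matching engine of FIG. 5: for every reference
    vector r, stream all current vectors c, track the smallest and second
    smallest distances D1, D2 and the index cmin of the best c, and emit the
    tuple (r_index, cmin, D1, D2) only if D1 < t * D2."""
    output = []
    for ri, r in enumerate(R):                 # r loaded into register 7
        D1, D2, cmin = float("inf"), float("inf"), None
        for ci, c in enumerate(C):             # c streamed from the SRAM 6
            P, D = check_and_distance(c, r)
            if not P:
                continue                       # pair not eligible as a match
            if D < D1:
                D1, D2, cmin = D, D1, ci       # new best; old best becomes D2
            elif D < D2:
                D2 = D
        if cmin is not None and D1 < t * D2:   # distinctiveness check, unit 14
            output.append((ri, cmin, D1, D2))  # tuple sent to output buffer 15
    return output
```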

In addition to the matching component (FIG. 11, 1000), according to an embodiment of the invention, there can be more components that are part of the low-power subsystem as shown in FIG. 11. A small CPU (1400), also called engine control unit (ECU), might control the different specialized processing units. It might also take over some algorithmic tasks, like creating descriptors of found features.

A scaling unit (1300) could generate images with scaled resolution from the original camera image or from already scaled images. This can help to reduce overall processing needs, e.g. by working on a smaller-resolution image, and additionally allows the creation of scale-invariant descriptors. Having multiple images, other processes could also be conducted in parallel on the different images.

Scale invariance could alternatively be achieved by having extraction and/or description algorithms which work on different scales, e.g. by scaling the filter size of a corner extraction process.

In an embodiment of the invention, one or more feature extraction processing units (1200) extract features from the at least one image or from the images at different scales. In one embodiment of the invention, at least one descriptor generation processing unit (1100) builds descriptors based on the features and the at least one image. According to an embodiment of the invention, the different specialized processing units (1100, 1200, 1300) are connected to a local memory unit (1500), which for example holds several buffers (e.g. of several image lines or even a complete image). Via a direct memory access controller (1600), the local memory unit (1500) can exchange data with a storage area (e.g. dynamic memory) outside the subsystem (2100), according to one embodiment. A host interface (1700), according to an embodiment, serves the purpose of allowing the application processing unit to control and configure the subsystem. This is not processing-intensive for any application processing unit and can be done in high-power mode or in low-power mode at low clock rates.

The components, or at least one of the components as described above, and their functions (also referred to as a hardware engine in the context of the present invention) are implemented on the integrated circuit by hardwired logic or configurable logic which processes logical functions. In other words, the functions to be performed in the detection process, as described above, may be implemented directly by means of a corresponding digital electronic circuit, particularly by means of hardwired logic or configurable logic. Such an electronic circuit may be implemented in a flexible manner using an integrated circuit of digital technology, in which a desired logical circuit may be programmed. That is, for the integration of a function according to the invention, as described above, an existing processing system may, at the appropriate location, be provided with or supplemented by at least a programmable logical circuit, such as a PLD (Programmable Logic Device) or an FPGA (Field Programmable Gate Array). Such a logical circuit may be implemented, for example, on an integrated circuit chip used, for instance, in a mobile device, such as a mobile telephone.

FIG. 13 shows an overview of a possible overall processing system, according to one embodiment. The processing system could be implemented as a system on a chip (SoC). The low-power subsystem (2100) is connected via a peripheral interface bus to at least one application processing unit, also called APU (2000). An APU could for example be an ARM Cortex-A9 CPU core. It is also possible, according to one embodiment, that depending on low-power mode or high-power mode, a different APU runs the operating system, e.g. an ARM Cortex-A9 in high-power mode and an ARM Cortex-M in low-power mode. Another interconnect bus connects the low-power subsystem to a dynamic memory controller, according to an embodiment. Via the peripheral interface bus, the APU can be connected to peripherals, e.g. a gyroscope.

FIG. 6 shows an embodiment of a possible implementation of the matcher (1000) on an integrated circuit 20. This embodiment was developed in order to handle not only short point-based descriptors or other short descriptors, but also longer descriptors, e.g. shape-based descriptors, and it extends the above approach to work with longer vectors, e.g. 3000 bytes long. As far as the same components are used as in the embodiment of FIG. 5, the respective components are designated with the same reference numbers.

In this variant of the embodiment of FIG. 5, a register bank 21 that holds a long vector r and two accumulator registers 22, 23 holding respective parameters Pacc and Dacc have been added. Only parts ci and ri of the long vectors c and r are compared at once. The functions Pi(c, r) and Di(c, r) are calculated incrementally and accumulated in Pacc and Dacc. The final values P and D are then read from these registers 22, 23 before D1 and D2 are updated as before. This extension allows the comparison of much longer vectors with minimum extra hardware effort. FIG. 6 shows an overview of the extended engine's components.

Thus, according to an aspect of the invention, there is provided a register bank that holds a reference feature descriptor vector R and two accumulator registers 22, 23 for holding a respective check parameter (Pacc) and a respective distance measure (Dacc), wherein only parts (ci and ri) of a respective current feature descriptor vector C and reference feature descriptor vector R are compared at once.
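
A minimal C model of this chunked accumulation is given below; the sum-of-absolute-differences distance, the 64-byte chunk width and the function names are our assumptions and are not prescribed by the invention.

    #include <stdint.h>
    #include <stdlib.h>

    #define CHUNK 64  /* bytes of c and r compared at once (assumed width) */

    /* One incremental step: add the partial distance Di(c, r) and the
       partial check parameter Pi(c, r) for one chunk to the accumulators,
       mirroring the registers Dacc and Pacc. */
    static void accumulate_chunk(const uint8_t *ci, const uint8_t *ri,
                                 uint32_t *dacc, uint32_t *pacc)
    {
        for (int k = 0; k < CHUNK; ++k) {
            *dacc += (uint32_t)abs((int)ci[k] - (int)ri[k]); /* SAD term       */
            *pacc += (uint32_t)(ci[k] > 0 && ri[k] > 0);     /* toy check term */
        }
    }

    /* Compare long vectors (e.g. 3000 bytes) chunk by chunk; the final
       values D and P are read out before D1 and D2 are updated. */
    static void compare_long(const uint8_t *c, const uint8_t *r, size_t len,
                             uint32_t *d_out, uint32_t *p_out)
    {
        uint32_t dacc = 0, pacc = 0;
        for (size_t i = 0; i + CHUNK <= len; i += CHUNK)
            accumulate_chunk(c + i, r + i, &dacc, &pacc);
        *d_out = dacc;
        *p_out = pacc;
    }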

Depending on the available size of the SRAM 6 or the number of different descriptors used in the overall recognition/initialization pipeline, the engine can be started several times. For example, it can first find the best matches between point-based descriptors and then find the best matches for shape-based descriptors, also using different thresholds and weights.

According to an embodiment of the invention, the calculations of P and D are performed fully in parallel in a pipelined manner. For example, the overall throughput is one vector comparison per clock cycle: with every cycle, a new vector c is read from the SRAM 6. After the pipeline has been filled, the values of D1, D2 and cmin are updated every cycle, too. The flexibility of the engine for computer vision tasks is also of advantage for finding a pose between two 3D point clouds, e.g. gathered with hardware such as Microsoft Kinect.
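
A sequential software analogue of this update of D1 (best distance), D2 (second-best distance) and cmin (index of the best match) is sketched below. In hardware the loop body corresponds to one clock cycle; here it merely models the data flow, with a SAD distance standing in for D and all names being assumptions.

    #include <stdint.h>
    #include <stdlib.h>

    /* Stream all candidate vectors c (as if read from the SRAM 6, one
       per cycle) against the fixed vector r, tracking the best and
       second-best distances and the index of the best match. */
    static void match_feature(const uint8_t *sram, size_t n_vecs, size_t len,
                              const uint8_t *r,
                              uint32_t *d1, uint32_t *d2, size_t *cmin)
    {
        *d1 = UINT32_MAX;
        *d2 = UINT32_MAX;
        *cmin = 0;
        for (size_t i = 0; i < n_vecs; ++i) {
            const uint8_t *c = sram + i * len;  /* new vector c each cycle */
            uint32_t d = 0;
            for (size_t k = 0; k < len; ++k)    /* SAD distance as stand-in */
                d += (uint32_t)abs((int)c[k] - (int)r[k]);
            if (d < *d1)      { *d2 = *d1; *d1 = d; *cmin = i; }
            else if (d < *d2) { *d2 = d; }
        }
    }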

FIG. 8 shows an overview of the method to switch between high-power and low-power mode. In S41, the application configures the system for its needs. In particular, reference features are provided and multimedia data might be downloaded. The system can then move to low-power mode (S42). This can include lowering the application processing unit's clock rate and turning off peripherals. In low-power mode, the system may watch for image trigger events (optional S43). This could be, for example, waiting until the device movement, according to accelerometer data, is below a certain threshold and above a second threshold, indicating that the user is looking at an object but has not put the device on a table while drinking a coffee. If S43 is not implemented, a camera image could be taken at a certain rhythm, e.g. every 100 ms.
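
A minimal sketch of the optional trigger test S43, assuming a single accelerometer-derived motion magnitude and two hypothetical threshold values:

    #include <stdbool.h>

    #define MOTION_LOWER 0.05f /* below: device resting, e.g. on a table (assumed) */
    #define MOTION_UPPER 0.30f /* above: device moving too much (assumed)          */

    /* Returns true when a camera image should be triggered (S44): the
       device is held still enough to suggest the user is looking at an
       object, but is not lying completely motionless. */
    static bool image_trigger(float motion_magnitude)
    {
        return motion_magnitude > MOTION_LOWER &&
               motion_magnitude < MOTION_UPPER;
    }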

In S44, at least one image is taken with a capturing device (in the case of a stereo camera, two images could be taken simultaneously). In S45, the image is analyzed and matched against a database of reference objects. In case certain conditions are met, geometric verification is conducted on at least one candidate object. The geometric verification can be conducted in the low-power subsystem or by the application processing unit at low clock rates, according to one embodiment. If no object was found, the system waits for another trigger event (S43) or for some time before repeating the process by taking a new picture (S44). If an object has been found, the system moves to high-power mode (wakes up) in S46. This can mean increasing the clock rate of the main application processing unit, turning on the display and additional sensors, and increasing the camera frame rate (e.g. to 30 Hz), according to an embodiment. Then, a high-power application can be run (S47), e.g. providing audio data about the object or calculating the position and orientation of the camera and the rigidly connected display with respect to the object in order to superimpose virtual objects.

The system may determine if it should move into low-power mode in S48. This might be determined from the user's action, e.g. waving a hand in front of the camera, or by speech commands. Alternatively, the system could move to low-power mode after it has presented all relevant information about the object, e.g. played the audio file or displayed an animation sequence via augmented reality. The system could also simply move into low-power mode after a certain time, e.g. 60 seconds. This time could start counting after the initial recognition, after the object is no longer recognized in the camera image, or after the multimedia data has finished playing.
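
The overall S41-S48 flow of FIG. 8, including this return to low-power mode, can be summarized as a simple state machine. The following C sketch uses placeholder hook functions for the individual steps; their names are ours and are not part of the invention.

    /* Placeholder hooks for steps S41-S48 (assumed names, provided elsewhere). */
    void configure_application(void);     /* S41 */
    void enter_low_power(void);           /* S42 */
    int  image_trigger_pending(void);     /* optional S43 */
    void wait_ms(int ms);
    void capture_image(void);             /* S44 */
    int  match_against_references(void);  /* S45, incl. geometric verification */
    void wake_up(void);                   /* S46 */
    void run_application(void);           /* S47 */
    int  should_sleep(void);              /* S48 */

    typedef enum { CONFIGURE, LOW_POWER, HIGH_POWER } power_state;

    static void power_mode_loop(void)
    {
        power_state state = CONFIGURE;
        for (;;) {
            switch (state) {
            case CONFIGURE:
                configure_application();   /* reference features, multimedia data */
                enter_low_power();         /* lower clock rate, peripherals off   */
                state = LOW_POWER;
                break;
            case LOW_POWER:
                wait_ms(100);                    /* fixed rhythm if S43 is absent */
                if (!image_trigger_pending())    /* optional S43 gate */
                    break;                       /* no trigger yet: keep waiting  */
                capture_image();                 /* S44 */
                if (match_against_references()) {/* S45 */
                    wake_up();                   /* S46: clocks, display, 30 Hz   */
                    state = HIGH_POWER;
                }
                break;
            case HIGH_POWER:
                run_application();         /* audio clip, AR overlay, ...      */
                if (should_sleep()) {      /* S48: user action or e.g. 60 s    */
                    enter_low_power();
                    state = LOW_POWER;
                }
                break;
            }
        }
    }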

FIG. 12 shows one possible hardware setup for an embodiment of the invention. The user wears a display (300) attached to his head (400) in front of his eyes (500). The display (300) should be rigidly attached to a camera (100) with a field of view (600). The camera is pointing roughly in the user's viewing direction (200). The SoC (FIG. 13) may be part of the camera and display unit or may be located in a device, e.g. a smartphone, which is wirelessly connected to the head-worn device.

Another possible hardware setup is shown in FIG. 14. The invention is especially beneficial for a scenario where a user 3100 who wears an information system 3200 equipped with a camera walks through a museum that exhibits images (as shown in FIG. 14). The wearable information system 3200 is hanging on the chest of the user 3100, and the camera of the information system points at the space in front of the user. The user starts walking through the museum while his information system is in a low-power mode. The user can now enjoy hours of walking through the museum without worrying about his information system's battery. According to the invention, the information system is capable of scanning the user's environment for interesting objects (e.g. image 3300). This can be done while consuming little power. As soon as an image 3300 comes into the field of view of the camera, the information system can “wake up” and move to a high-power mode, for example in order to download interesting content related to image 3300 and display it using Augmented Reality, or in order to start an audio clip explaining image 3300.

Applications:

FIG. 9 shows a possible use of a descriptor relying on depth information, in order to give an example of a more complex embodiment of the invention.

According to aspects of the invention, a depth of an element, e.g. of a pixel, in an image may be used as further information when matching features. Generally, the depth of an element in an image (e.g. a pixel) may be defined as the distance between the physical surface that is imaged in this element (pixel) and the capturing device, particularly the optical center of the capturing device.

FIG. 9 shows a possible combination of a depth extraction mechanism with physical scale feature descriptors for use in optical pose estimation, for example in order to create outdoor AR experiences. In this example, depth is extracted using rough sensor data and an environment model, as in FIG. 9.

In step S111, an intensity image I1 is captured by a capturing device or loaded. In addition, an initial pose of the capturing device while capturing I1 is estimated from rough sensor measurements such as GPS position and orientation sensor information. Finally, an advanced environment model including 3D data and image data (similar to Google Streetview) is provided (step S112). Image data is only necessary if a reference model for tracking (e.g. already containing feature 3D coordinates and feature descriptors) has not been created in advance. In step S113, the environment model is loaded using the assumed camera pose provided by step S111, i.e. the environment model is rendered from the camera viewpoint of intensity image I1. In step S114, depth information is retrieved from the environment model and used in step S115 for calculating the real-scale descriptors of detected features. In other words, using the depth information registered with the image I1, real-scale features are extracted at a fixed scale of, for example, 1 m. Because the environment model combines 3D data and image data, a reference 3D model of physical scale features with a scale of 1 m can be created (S116; this can of course be done in advance).
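
To make the fixed physical scale concrete: under a pinhole camera model, a patch of physical size s at depth d projects to approximately f*s/d pixels, where f is the focal length in pixels. The helper below is an illustration under that assumption, not the literal extraction method. For example, with f = 800 px, a 1 m scale and a depth of 20 m, the feature would be described over a patch of about 40 pixels.

    /* Approximate pixel extent of a feature patch covering a fixed
       physical size (e.g. 1 m) at the depth retrieved from the
       environment model in step S114. */
    static float patch_size_px(float focal_length_px, /* intrinsic f (pixels) */
                               float physical_size_m, /* e.g. 1.0f            */
                               float depth_m)         /* depth of the feature */
    {
        return focal_length_px * physical_size_m / depth_m;
    }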

Using an optimization algorithm, the refined pose of I1 in the environment model's coordinate system can be calculated. The refined pose can then be used for an application, e.g. an Augmented Reality visualization of tourist data, or optionally be used to refine S111 and iterate through steps S111-S117 until the change in pose has fallen below a defined quality threshold.

The found feature matches can then be used for applications including object detection, object classification, object localization, and localization of the camera in the global coordinate system.

The latter, also referred to as “self-localization”, can for instance be performed by means of robust pose estimation methods such as RANSAC, PROSAC or M-estimators. Note that such methods require an estimate of the intrinsic camera parameters, in particular the focal length. Depending on the available information on the position and/or orientation of the capturing device and the depth of pixels, different possible implementations of the inventive idea arise. They differ in the spatial constraints used to narrow the search space, or the check parameter P, in the matching process, depending on the position and/or orientation of reference features that are potential matches for a given current feature. Examples that we consider particularly important will be explained in detail in the following.

Provided with a measurement of the gravity vector in a coordinate system associated with the capturing device, e.g. from inertial sensors, and the depth of a current feature in the current camera image, e.g. by means of a depth-from-stereo method, the method according to aspects of the invention computes the relative or absolute altitude of this feature.

The 2D position of a feature in the image, together with the intrinsic camera parameters, enables defining a 3D ray in a coordinate system associated with the capturing device. As, in addition, the depth of the feature may be known, the feature's 3D position in the camera-aligned coordinate system can be computed. The vector from the optical center of the capturing device to the 3D feature position is then projected onto the normalized gravity vector, resulting in an altitude of the feature.

The method described above results in a relative altitude measure with respect to the capturing device. To compute the absolute altitude of the feature, the device's absolute altitude needs to be added. This can either be measured, e.g. via GPS or a barometer, or can be based on an assumption as explained above.

FIG. 4 illustrates a possible implementation of this aspect of the invention. Particularly, FIG. 4 shows a capturing device CD that provides a measurement of a gravity vector G in device coordinates (i.e. coordinates of the capturing device coordinate system) and the depth D of a feature F1. Given these two pieces of information, the relative altitude RA of the feature F1 with respect to the capturing device CD can be computed. Particularly, the 2D position of the feature F1 in the image, together with the intrinsic camera parameters, enables defining a 3D ray in the coordinate system associated with the capturing device. As the depth D of the feature F1 is known, the feature's 3D position in the camera-aligned coordinate system can be computed. The vector from the optical center of the capturing device CD to the 3D feature position of feature F1 is then projected onto the normalized gravity vector, resulting in the relative altitude RA of the feature F1. Adding the (absolute) altitude CDA of the capturing device CD results in the absolute altitude AA of the feature F1. Analogous calculations can be made for feature F2 to calculate its altitude.
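
The computation illustrated in FIG. 4 can be condensed into a few lines of C, assuming a pinhole camera with focal length f and principal point (cx, cy); the function names and the sign convention (altitude measured along the gravity vector G as reported by the sensor) are our assumptions.

    #include <math.h>

    /* Relative altitude RA of a feature: back-project its 2D position
       using the depth D, then project the resulting 3D vector onto the
       normalized gravity vector G (device coordinates). */
    static float relative_altitude(float u, float v, /* 2D feature position   */
                                   float depth,      /* depth D               */
                                   float f, float cx, float cy, /* intrinsics */
                                   const float g[3]) /* measured gravity G    */
    {
        /* Feature's 3D position in the camera-aligned coordinate system. */
        float x = (u - cx) / f * depth;
        float y = (v - cy) / f * depth;
        float z = depth;
        float gnorm = sqrtf(g[0]*g[0] + g[1]*g[1] + g[2]*g[2]);
        return (x*g[0] + y*g[1] + z*g[2]) / gnorm;
    }

    /* Absolute altitude AA: add the device's own altitude CDA, measured
       e.g. via GPS or a barometer, or based on an assumption. */
    static float absolute_altitude(float rel_altitude, float device_altitude)
    {
        return device_altitude + rel_altitude;
    }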

The search space SS for a reference feature corresponding to the current feature F1 is then defined around its altitude AA. Note that in this way, the reference feature F2 is not considered as a possible match, even though it looks very similar to F1, because it does not fall into the search space SS. The search space can of course be controlled through the calculation of P in the proposed hardware engine. Thereby, the invention according to this aspect reduces the probability of mismatches.
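
In the hardware engine, such a constraint could for instance be folded into the check parameter P by rejecting any reference feature whose stored altitude lies outside a window around AA; the window width below is an assumed tuning parameter, not a value taken from the invention.

    #include <stdbool.h>
    #include <math.h>

    /* Accept a reference feature as a match candidate only if its
       altitude falls into the search space SS around the current
       feature's absolute altitude AA. */
    static bool altitude_check(float ref_altitude, float current_aa,
                               float window_m) /* e.g. 2.0f, assumed */
    {
        return fabsf(ref_altitude - current_aa) <= window_m;
    }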

According to one aspect of the invention, a very large set of reference features (e.g. millions or billions) is first reduced by a software approach (e.g. using GPS data as input) to a smaller set (e.g. thousands or hundreds of thousands), which is then matched using the hardware engine.
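
A sketch of this two-stage reduction, with a hypothetical GPS-based software prefilter ahead of the hardware matcher (the structure layout, names and radius are all illustrative):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct {
        float   x, y;      /* feature position, e.g. in a map frame */
        uint8_t desc[32];  /* descriptor bytes (length assumed)     */
    } ref_feature;

    /* Stage 1 (software): keep only reference features within a radius
       of the rough GPS position; the surviving set (e.g. thousands) is
       then handed to the hardware engine for matching (stage 2). */
    static size_t prefilter_by_gps(const ref_feature *all, size_t n_all,
                                   float gps_x, float gps_y, float radius,
                                   ref_feature *out)
    {
        size_t kept = 0;
        for (size_t i = 0; i < n_all; ++i) {
            float dx = all[i].x - gps_x;
            float dy = all[i].y - gps_y;
            if (dx * dx + dy * dy <= radius * radius)
                out[kept++] = all[i];
        }
        return kept;
    }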

Although various embodiments are described herein with reference to certain components or devices, any other configuration of components or devices, as described herein or evident to the skilled person, can also be used when implementing any of these embodiments. Any of the devices or components as described herein may be or may comprise a respective processing device (not explicitly shown), such as a microprocessor, for performing one or more of the tasks as described herein. One or more of the processing tasks may be processed by one or more of the components or their processing devices which are communicating with each other, e.g. by a respective point-to-point communication or via a network, e.g. via a server computer.

What is claimed is:
1.-21. (canceled)
22. An information system, comprising: a camera; a processor operatively coupled to the camera; a device operatively coupled to the processor; and a memory device operatively coupled to the camera, the processor and the device, the memory device comprising instructions executable by the processor to: obtain, in a low-power mode of the information system, an image captured by the camera; extract, in the low-power mode, a first feature of an object in the image; generate, in the low-power mode, a higher level descriptor of the first feature; cause, in the low-power mode, the device to determine that the higher level descriptor matches a reference object feature descriptor; and activate, in response to determining the higher level descriptor matches the reference object feature descriptor, a high-power mode of the information system.
23. The information system of claim 22, wherein the device determines that at least one of the higher level descriptors matches a reference object feature descriptor by: loading a plurality of reference object feature descriptors into a memory of the device; loading the higher level descriptor for the first feature; determining a distance measure between the higher level descriptor and each of the plurality of reference object feature descriptors; and calculating a check parameter to determine whether the higher level descriptor is a valid match for at least one of the plurality of reference object feature descriptors.
24. The information system of claim 22, wherein in the low-power mode a clock rate of the processor is lower than in the high-power mode.
25. The information system of claim 22, wherein the first feature comprises a point-feature.
26. The information system according to claim 22, wherein the higher level descriptor comprises a scale-invariant feature descriptor.
27. The information system according to claim 22, wherein the higher level descriptor comprises a rotation-invariant feature descriptor.
28. The information system according to claim 22, further comprising instructions to cause the processor to display, in the high-power mode, augmented reality information related to the object.
29. The information system of claim 22, wherein the processor comprises one or more processors.
30. The information system of claim 29, wherein the instructions to cause the one or more processors to extract a first feature of an object in the image comprise instructions to cause the one or more processors to extract one or more features of the object in the image.
31. A computer readable medium comprising computer readable code executable by a processor to: obtain, in a low-power mode of a system, an image captured by a camera; extract, in the low-power mode, a first feature of an object in the image; generate, in the low-power mode, a higher level descriptor of the first feature; cause, in the low-power mode, a device to determine that the higher level descriptor matches a reference object feature descriptor; and activate, in response to determining the higher level descriptor matches the reference object feature descriptor, a high-power mode of the system.
32. The computer readable medium of claim 31, wherein the device determines that at least one of the higher level descriptors matches a reference object feature descriptor by: loading a plurality of reference object feature descriptors into a memory of the device; loading the higher level descriptor for the first feature; determining a distance measure between the higher level descriptor and each of the plurality of reference object feature descriptors; and calculating a check parameter to determine whether the higher level descriptor is a valid match for at least one of the plurality of reference object feature descriptors.
33. The computer readable medium of claim 31, wherein in the low-power mode a clock rate of the processor is lower than in the high-power mode.
34. The computer readable medium of claim 31, wherein the first feature comprises a point-feature.
35. The computer readable medium of claim 31, wherein the higher level descriptor comprises a scale-invariant feature descriptor.
36. The computer readable medium of claim 31, wherein the higher level descriptor comprises a rotation-invariant feature descriptor.
37. The computer readable medium of claim 31, further comprising computer readable code to cause the processor to display, in the high-power mode, augmented reality information related to the object.
38. A method for managing a low-power and a high-power mode of a system, comprising: obtaining, in a low-power mode of the system, an image captured by a camera; extracting, in the low-power mode, a first feature of an object in the image; generating, in the low-power mode, a higher level descriptor of the first feature; causing, in the low-power mode, a device to determine that the higher level descriptor matches a reference object feature descriptor; and activating, in response to determining the higher level descriptor matches the reference object feature descriptor, a high-power mode of the system.
39. The method of claim 38, wherein the device determines that at least one of the higher level descriptors matches a reference object feature descriptor by: loading a plurality of reference object feature descriptors into a memory of the device; loading the higher level descriptor for the first feature; determining a distance measure between the higher level descriptor and each of the plurality of reference object feature descriptors; and calculating a check parameter to determine whether the higher level descriptor is a valid match for at least one of the plurality of reference object feature descriptors.
40. The method of claim 38, wherein the higher level descriptor comprises a scale-invariant feature descriptor.
41. The method of claim 38, further comprising displaying, in the high-power mode, augmented reality information related to the object.